Overview: Script for cleaning water temperature data from water quality stations; sources and notes on the data are below. First I remove data points flagged as failed or suspect. Then I remove points more than +/- 2 standard deviations from a 2-hour rolling mean. Finally, I calculate the hourly median, since the sites logged data at different frequencies. This sets me up to do analysis between sites and with field data. Sites are graphed in order from closest to the delta to the mouth of the bay.
Notes on data:
- Accessed and downloaded all data on September 30, 2020. Downloaded all data available from 1/1/2017-12/31/2019.
- China Camp: http://cdmo.baruch.sc.edu/aqs/zips.cfm. Data logged every 15 minutes. Requested citation format: NOAA National Estuarine Research Reserve System (NERRS). System-wide Monitoring Program. Data accessed from the NOAA NERRS Centralized Data Management Office website: http://cdmo.baruch.sc.edu/; accessed 30 September 2020.
- Richardson Bay: NERR data. Data logged every 15 minutes.
- EOS: https://oceanview.pfeg.noaa.gov/erddap/tabledap/rtcctdRTCysi.html. Data logged every 6 minutes. Time in UTC. Should I use this link instead? http://erddap.cencoos.org/erddap/tabledap/tiburon-water-tibc1.html. pH and salinity available; no air temperature.
- Fort Point: http://erddap.cencoos.org/erddap/tabledap/fort-point.html?time%2Csea_water_temperature%2Csea_water_temperature_qc_agg%2Cz&time%3E%3D2020-09-20T21%3A07%3A41Z&time%3C%3D2020-09-29T12%3A00%3A00Z. Time in UTC. Data logged every 6 minutes.
Next steps:
- Fix the aesthetics of the graphs: font too small, change gradient, clean up keys, etc.
- Analysis with field data
- Site characterization summary table
Set up
rm(list=ls())
library(tidyverse)
library(ggpubr)
library(scales)
library(chron)
library(plotly)
library(taRifx)
library(aweek)
library(easypackages)
library(renv)
library(here)
library(ggthemes)
library(gridExtra)
library(patchwork)
library(tidyquant)
library(recipes)
library(cranlogs)
library(knitr)
library(openair)
Make a custom function to calculate the mean, standard deviation (sd), and an approximate 95% interval, where x is a numeric vector. Here “hi” and “lo” are defined as 2 standard deviations away from the mean (hence the “.95” in the column names). This step only creates the function and does not manipulate the data. You can change the equation to define hi and lo as 3 or however many sds away you would like. na.rm=TRUE removes any “NA” values. “ret” holds the new columns it’ll add when you run this on a df.
custom_stat_fun_2<-function(x, na.rm=TRUE){
  m<-mean(x, na.rm=na.rm)
  s<-sd(x, na.rm=na.rm)
  hi<-m+2*s
  lo<-m-2*s
  ret<-c(mean=m, stdev=s, hi.95=hi, lo.95=lo)
  ret
}
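As a quick sanity check, running the function on a small made-up vector (the values are illustrative only) returns the four named statistics and skips the NA:

```r
# Toy vector with one missing value; numbers are illustrative only
x <- c(10, 12, 11, 13, NA, 12)
custom_stat_fun_2(x)
# mean is 11.6; hi.95/lo.95 bracket it at +/- 2 sd
```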
China Camp
Read in data. Subset (not needed but it’s a large df). Needs a column named “date” with no time stamp for step later.
cc<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/china_camp_all.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
cc$date<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y"))
cc$datetime<-as.POSIXct(cc$DateTimeStamp, format = c("%m/%d/%Y %H:%M"))
cc<-subset(cc, select=c(date, datetime,Temp, F_Temp))
cc%>%
ggplot(aes(x=datetime, y=Temp, color=Temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from China Camp: All data points",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 847 rows containing missing values (geom_point).
Remove data that failed QC test
cc<-cc[!grepl("-3", cc$F_Temp),]
cc%>%
ggplot(aes(x=date, y=Temp, color=Temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from China Camp: Failed QC points removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).
Remove data points flagged as “suspect” too
cc<-cc[!grepl("1", cc$F_Temp),]
cc%>%
ggplot(aes(x=date, y=Temp, color=Temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from China Camp: Failed & suspect QC points removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 835 rows containing missing values (geom_point).
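A note on removing flagged rows by pattern: `cc[!grepl(...), ]` is safer than `cc[-grep(...), ]`. The two behave the same when matches exist, but if nothing matches the pattern, `grep()` returns `integer(0)` and negative indexing with it silently drops every row. A toy illustration:

```r
# With no matching rows, -grep() drops everything; !grepl() keeps everything
df <- data.frame(val = 1:3, flag = c("0", "0", "0"))  # toy flags, none failed

nrow(df[-grep("-3", df$flag), ])   # 0 rows: grep() returned integer(0)
nrow(df[!grepl("-3", df$flag), ])  # 3 rows: the safe pattern
```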
Roll apply using custom stat function. Needs a column named “date” with no time stamp in order to work. Since data were collected every 15 minutes, 8 observations = a 2-hour window.
cc<-cc%>%
tq_mutate(
select= Temp,
mutate_fun= rollapply,
width= 8,
align= "right",
by.column= FALSE,
FUN= custom_stat_fun_2,
na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
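Under the hood, this is a right-aligned window of 8 observations ending at each row, with the four statistics computed per window; the first 7 rows have no full window and stay NA. A base-R sketch of the same idea (toy data, not the buoy series):

```r
# Right-aligned rolling window of width 8, computed in base R on toy data
roll_window_stats <- function(x, width = 8){
  n <- length(x)
  out <- matrix(NA_real_, nrow = n, ncol = 4,
                dimnames = list(NULL, c("mean", "stdev", "hi.95", "lo.95")))
  for (i in width:n){
    win <- x[(i - width + 1):i]   # the `width` observations ending at row i
    m <- mean(win, na.rm = TRUE)
    s <- sd(win, na.rm = TRUE)
    out[i, ] <- c(m, s, m + 2 * s, m - 2 * s)
  }
  out
}

roll_window_stats(c(12, 12.1, 12.3, 12.2, 12.4, 12.3, 12.5, 12.4, 18, 12.5))
```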
Filter to remove values that are +/- 2 standard deviations from the rolling mean
cc<-filter(cc, Temp <hi.95 & Temp >lo.95)
cc%>%
ggplot(aes(x=datetime, y=Temp, color=Temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from China Camp: Failed & suspect QC points + +/- 2sd removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 12 rows containing missing values (geom_point).
Hourly median
For the timeAverage function to work, you need a POSIXct column named “date”, and it needs the timestamp included, so we’ll drop the date-only column and add it back in with the time. Interval = frequency of data collection.
cc<-subset(cc, select=-c(date))
cc$date<-as.POSIXct(cc$datetime, format = c("%m/%d/%Y %H:%M"))
cc.1hr.med<- timeAverage(cc,
avg.time="1 hour",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 12 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
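For reference, the hourly-median step itself is simple to sketch in base R with aggregate() and cut(); timeAverage() does the same grouping but also pads missing intervals and handles time zones. A toy series (not the buoy data):

```r
# Hourly median in base R: group a POSIXct column into 1-hour bins with cut()
dt <- as.POSIXct("2018-07-01 00:00", tz = "UTC") + seq(0, 55, by = 5) * 60
temp <- c(14, 14.2, 14.1, 14.3, 30, 14.2, 14.4, 14.3, 14.5, 14.4, 14.6, 14.5)
df <- data.frame(datetime = dt, Temp = temp)

hourly <- aggregate(Temp ~ cut(datetime, "1 hour"), data = df, FUN = median)
names(hourly) <- c("hour", "Temp_med")
hourly   # the one spike (30) barely moves the hourly median
```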
cc.1hr.med<-subset(cc.1hr.med, select=(-c(mean, stdev, hi.95, lo.95)))
cc_temp<-cc.1hr.med %>% ggplot(aes(x=datetime, y=Temp, color=Temp))+
geom_point(alpha=0.5)+ylim(5,35)+
labs(title="Hourly median water temperature from China Camp",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
cc_temp
## Warning: Removed 921 rows containing missing values (geom_point).
Format and save
cc.1hr.med$date<-as.Date(cc.1hr.med$datetime, format = c("%Y-%m-%d"))
names(cc.1hr.med)[names(cc.1hr.med) == "Temp"] <- "water_temp"
write.csv(cc.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/cc_watertemp.csv")
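One small caveat with write.csv(): by default it writes row names as an unnamed first column, which comes back as an extra “X” column on re-import. Passing row.names = FALSE avoids this; a tempfile demo:

```r
# write.csv() writes row names as an extra first column unless told not to
df <- data.frame(date = as.Date("2018-01-01"), water_temp = 12.3)
f <- tempfile(fileext = ".csv")
write.csv(df, f, row.names = FALSE)
names(read.csv(f))   # "date" "water_temp" -- no leading "X" column
```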
EOS pier
Set up
eos<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_bouy_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
eos$date<-as.Date(eos$date, format = c("%m/%d/%Y"))
eos$datetime<-as.POSIXct(eos$datetime, format = c("%Y-%m-%dT%H:%M:%SZ"))
eos<-subset(eos, select=c(date, datetime, temperature))
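The ERDDAP timestamps end in “Z”, i.e. UTC; parsing them without an explicit tz makes R assume the local (daylight-saving) time zone, which is what later triggers the DST warnings from timeAverage. Passing tz = "UTC" keeps the series in UTC:

```r
# Parse an ERDDAP-style UTC timestamp with an explicit time zone
ts <- "2018-06-01T12:00:00Z"
utc <- as.POSIXct(ts, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
format(utc, "%Y-%m-%d %H:%M %Z")   # "2018-06-01 12:00 UTC"
```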
eos%>%
ggplot(aes(x=datetime, y=temperature, color=temperature))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from EOS research pier: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Looks good except for that set of high values at the end of 2018. This corresponds with the suspicious data in salinity. Likely bad data that needs to be removed.
Since this data set doesn’t have a QC flag column, I’ll remove values that are far outside the expected water temperature range in Tiburon.
eos<-filter(eos, temperature >0 & temperature <22)
eos%>%
ggplot(aes(x=datetime, y=temperature, color=temperature))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from EOS research pier: Values outside of expected range removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Roll apply using custom stat function. Data collected every 6 minutes so 20 observations=2 hours. Needs a column named “date” with no time stamp in order to work.
2hr
eos<-eos%>%
tq_mutate(
select= temperature,
mutate_fun= rollapply,
width= 20,
align= "right",
by.column= FALSE,
FUN= custom_stat_fun_2,
na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
Filter and graph
eos<-filter(eos, temperature <hi.95 & temperature >lo.95)
ggplot(data = eos, mapping = aes(x = datetime, y = temperature, color=temperature)) + geom_point()+
labs(title="Water temperature data from EOS research pier: Extreme values & +/- 2sd data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Looks pretty good
Hourly median
eos<-subset(eos, select=-c(date))
names(eos)[names(eos) == "datetime"] <- "date"
eos$date<-as.POSIXct(eos$date, format = c("%Y-%m-%dT%H:%M:%SZ"))
eos.1hr.med<- timeAverage(eos,
avg.time="1 hour",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
eos.1hr.med<-subset(eos.1hr.med, select=(-c(mean, stdev, hi.95, lo.95)))
eos_temp<-eos.1hr.med %>% ggplot(aes(x=date, y=temperature, color=temperature))+
geom_point(alpha=0.5)+ylim(5,35)+
labs(title="Hourly median water temperature from EOS research pier",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
eos_temp
## Warning: Removed 513 rows containing missing values (geom_point).
Looks good. Not sure about those few points at the end of 2018. Again, probably bad data that I need to remove earlier on.
Format and save
names(eos.1hr.med)[names(eos.1hr.med) == "date"] <- "datetime"
eos.1hr.med$date<-as.Date(eos.1hr.med$datetime, format = c("%Y-%m-%d"))
names(eos.1hr.med)[names(eos.1hr.med) == "temperature"] <- "water_temp"
write.csv(eos.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/eos_watertemp.csv")
Richardson Bay
Set up
rb<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/richardson_bay_all.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
rb$date<-as.Date(rb$DateTimeStamp, format = c("%m/%d/%y"))
rb$datetime<-as.POSIXct(rb$DateTimeStamp, format = c("%m/%d/%y %H:%M"))
rb<-subset(rb, select=c(date, datetime, Temp, F_Temp, DateTimeStamp))
rb%>%
ggplot(aes(x=datetime, y=Temp, color=Temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from Richardson Bay: All data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).
Remove flagged data that did not pass QAQC
rb<-rb[!grepl("-3",rb$F_Temp),]
rb%>%
ggplot(aes(x=datetime, y=Temp, color=Temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from Richardson Bay: Failed QC data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).
Remove data points flagged as “suspect”
rb<-rb[!grepl("1",rb$F_Temp),]
rb%>%
ggplot(aes(x=datetime, y=Temp, color=Temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature from Richardson Bay: Failed & suspect QC data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 1508 rows containing missing values (geom_point).
Roll apply using custom stat function. Data collected every 15 min so 2hrs=8 observations
rb<-rb%>%
tq_mutate(
select= Temp,
mutate_fun= rollapply,
width= 8,
align= "right",
by.column= FALSE,
FUN= custom_stat_fun_2,
na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
Filter
rb<-filter(rb, Temp <hi.95 & Temp >lo.95)
rb%>%
ggplot(aes(x=datetime, y=Temp, color=Temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from Richardson Bay: Failed & suspect QC data + -/+2sd removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
## Warning: Removed 5 rows containing missing values (geom_point).
Hourly median
rb<-subset(rb, select=-c(date))
rb$date<-as.POSIXct(rb$datetime, format = c("%m/%d/%y %H:%M"))
rb.1hr.med<- timeAverage(rb,
avg.time="1 hour",
data.thresh = 0,
statistic = "median",
interval = "15 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 5 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
rb_temp<-rb.1hr.med%>%
ggplot(aes(x=date, y=Temp, color=Temp))+
geom_point(alpha=0.5)+ylim(5,35)+
labs(title="Hourly median water temperature data from Richardson Bay",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of NERR")
rb_temp
## Warning: Removed 5 rows containing missing values (geom_point).
Format and save
rb.1hr.med<-subset(rb.1hr.med, select=(-c(mean, stdev, hi.95, lo.95)))
names(rb.1hr.med)[names(rb.1hr.med) == "Temp"] <- "water_temp"
rb.1hr.med$date<-as.Date(rb.1hr.med$date, format = c("%Y-%m-%d"))
write.csv(rb.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/rb_watertemp.csv")
Fort Point
Set up
fp<-read.csv("C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/fort_point_r.csv", header = TRUE, sep=",", fileEncoding="UTF-8-BOM", stringsAsFactors = FALSE)
fp$date<-as.Date(fp$date, format = c("%Y-%m-%d"))
names(fp)[names(fp) == "sea_water_temperature"] <- "temp"
names(fp)[names(fp) == "sea_water_temperature_qc_agg"] <- "F_temp"
fp<-subset(fp, select=c(datetime, date, temp, F_temp))
fp%>%
ggplot(aes(x=date, y=temp, color=temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from Fort Point: All Data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Plot with a better view
fp%>%
ggplot(aes(x=date, y=temp, color=temp))+
geom_point(alpha=0.5)+ylim(0,20)+
labs(title="Water temperature data from Fort Point: All Data",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
## Warning: Removed 4 rows containing missing values (geom_point).
Quite a few points around 0 that don’t look right.
Remove flagged data, 4 indicates fail.
fp<-filter(fp, F_temp!=4)
fp%>%
ggplot(aes(x=date, y=temp, color=temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from Fort Point: Failed QC data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Remove suspect data
fp<-filter(fp, F_temp!=3)
fp%>%
ggplot(aes(x=date, y=temp, color=temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from Fort Point: Failed & suspect QC data removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Still has those points around 0 which just do not look right to me. Hopefully they’re removed with the next few steps.
Roll apply using custom stat function. Data collected every 6 minutes so 2hrs = 20 observations
fp<-fp%>%
tq_mutate(
select= temp,
mutate_fun= rollapply,
width= 20,
align= "right",
by.column= FALSE,
FUN= custom_stat_fun_2,
na.rm= TRUE)
## Warning in zoo(y, order.by = index(x), ...): some methods for "zoo" objects do
## not work if the index entries in 'order.by' are not unique
fp<-filter(fp, temp <hi.95 & temp >lo.95)
fp%>%
ggplot(aes(x=date, y=temp, color=temp))+
geom_point(alpha=0.5)+
labs(title="Water temperature data from Fort Point: Failed & suspect QC + +/-2sd removed",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
Hourly median
fp<-subset(fp, select=-c(date))
fp$date<-as.POSIXct(fp$datetime, format = c("%Y-%m-%dT%H:%M:%SZ"))
fp.1hr.med<- timeAverage(fp,
avg.time="1 hour",
data.thresh = 0,
statistic = "median",
interval = "6 min",
remove.calm = FALSE,
type = "default")
## Warning: Missing dates detected, removing 29 lines
## Warning in checkPrep(mydata, vars, type = "default", remove.calm = FALSE, :
## Detected data with Daylight Saving Time, converting to UTC/GMT
fp_temp<-ggplot(data = fp.1hr.med, mapping = aes(x = date, y = temp, color=temp)) + geom_point()+ylim(5,35)+
labs(title="Hourly median water temperature data from Fort Point",
subtitle="01/01/2017 - 12/31/2019",
caption= "data courtesy of CeNCOOS")
fp_temp
## Warning: Removed 3329 rows containing missing values (geom_point).
Most of the zeros were removed. Don’t like those two points that stick out early on.
Format and save
fp.1hr.med<-subset(fp.1hr.med, select=(-c(mean, stdev, hi.95, lo.95)))
names(fp.1hr.med)[names(fp.1hr.med) == "date"] <- "datetime"
fp.1hr.med$date<-as.Date(fp.1hr.med$datetime, format = c("%Y-%m-%d"))
names(fp.1hr.med)[names(fp.1hr.med) == "temp"] <- "water_temp"
write.csv(fp.1hr.med, "C:/Users/chels/Box Sync/Thesis/Data/Working data/Bouy data/fp_watertemp.csv")
Graph together
all_temp<-ggarrange(cc_temp, eos_temp, rb_temp, fp_temp, nrow=4, ncol=1)
## Warning: Removed 921 rows containing missing values (geom_point).
## Warning: Removed 513 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 3329 rows containing missing values (geom_point).
all_temp